Search CORE

6,254 research outputs found

Protein-RNA interactions: a structural analysis

Author: Berman HM
Daley DT
Jones S
Luscombe NM
Thornton JM
Publication venue
Publication date: 01/01/2001
Field of study

A detailed computational analysis of 32 protein-RNA complexes is presented. A number of physical and chemical properties of the intermolecular interfaces are calculated and compared with those observed in protein-double-stranded DNA and protein-single-stranded DNA complexes. The interface properties of the protein-RNA complexes reveal the diverse nature of the binding sites. van der Waals contacts played a more prevalent role than hydrogen bond contacts, and preferential binding to guanine and uracil was observed. The positively charged residue, arginine, and the single aromatic residues, phenylalanine and tyrosine, all played key roles in the RNA binding sites. A comparison between protein-RNA and protein-DNA complexes showed that whilst base and backbone contacts (both hydrogen bonding and van der Waals) were observed with equal frequency in the protein-RNA complexes, backbone contacts were more dominant in the protein-DNA complexes. Although similar modes of secondary structure interactions have been observed in RNA and DNA binding proteins, the current analysis emphasises the differences that exist between the two types of nucleic acid binding protein at the atomic contact level

Crossref

UCL Discovery

PubMed Central

Sussex Research Online

Characterization of the definitive classical calpain family of vertebrates using phylogenetic, evolutionary and expression analyses

Author: Benton MJ
Fieller EC
Hall TA
Jones DT
Muse SV
Publication venue: 'The Royal Society'
Publication date: 09/04/2014
Field of study

Peer reviewedPublisher PD

Aberdeen University Research

Crossref

PubMed Central

Edinburgh Research Explorer

Accurate contact predictions using covariation techniques and machine learning.

Author: Jones DT
Kosciolek T
Publication venue
Publication date: 01/09/2016
Field of study

Here we present the results of residue-residue contact predictions achieved in CASP11 by the CONSIP2 server, which is based around our MetaPSICOV contact prediction method. On a set of 40 target domains with a median family size of around 40 effective sequences, our server achieved an average top-L/5 long-range contact precision of 27%. MetaPSICOV method bases on a combination of classical contact prediction features, enhanced with three distinct covariation methods embedded in a two-stage neural network predictor. Some unique features of our approach are (1) the tuning between the classical and covariation features depending on the depth of the input alignment and (2) a hybrid approach to generate deepest possible multiple-sequence alignments by combining jackHMMer and HHblits. We discuss the CONSIP2 pipeline, our results and show that where the method underperformed, the major factor was relying on a fixed set of parameters for the initial sequence alignments and not attempting to perform domain splitting as a preprocessing step. Proteins 2015. © 2015 The Authors. Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc

UCL Discovery

PubMed Central

Computational Methods for Annotation Transfers from Sequence

Author: Cozzetto D
Jones DT
Publication venue: Springer New York
Publication date: 03/11/2016
Field of study

Surveys of public sequence resources show that experimentally supported functional information is still completely missing for a considerable fraction of known proteins and is clearly incomplete for an even larger portion. Bioinformatics methods have long made use of very diverse data sources alone or in combination to predict protein function, with the understanding that different data types help elucidate complementary biological roles. This chapter focuses on methods accepting amino acid sequences as input and producing GO term assignments directly as outputs; the relevant biological and computational concepts are presented along with the advantages and limitations of individual approaches

UCL Discovery

DISOPRED3: Precise disordered region predictions with annotated protein binding activity

Author: Cozzetto D
Jones DT
Publication venue
Publication date: 13/11/2014
Field of study

Motivation: A sizeable fraction of eukaryotic proteins contain intrinsically disordered regions (IDRs), which act in unfolded states or by undergoing transitions between structured and unstructured conformations. Over time, sequence-based classifiers of IDRs have become fairly accurate and currently a major challenge is linking IDRs to their biological roles from the molecular to the systems level. Results: We describe DISOPRED3, which extends its predecessor with new modules to predict IDRs and protein binding sites within them. Based on recent CASP evaluation results, DISOPRED3 can be regarded as state of the art in the identification of IDRs, and our self-assessment shows that it significantly improves over DISOPRED2 because its predictions are more specific across the whole board and more sensitive to IDRs longer than 20 amino acids. Predicted IDRs are annotated as protein binding through a novel SVM-based classifier, which uses profile data and additional sequence-derived features. Based on benchmarking experiments with full cross-validation, we show that this predictor generates precise assignments of disordered protein binding regions and that it compares well with other publicly available tools. Availability: http://bioinf.cs.ucl.ac.uk/disopred

UCL Discovery

Learning a Functional Grammar of Protein Domains using Natural Language Word Embedding Techniques

Author: Buchan DW
Jones DT
Publication venue
Publication date: 01/04/2020
Field of study

In this paper, using word2vec, a widely-used natural language processing method, we demonstrate that proteins domains may have a learnable implicit semantic "meaning" in the context of their functional contributions to multi-domain proteins in which they are found. Word2vec is a group of models which can be used to produce semantically meaningful embeddings of words or tokens in a fixed-dimension vector space. In this work, we treat multi-domain proteins as "sentences" where domain identifiers are tokens which may be considered as "words". Using all InterPro [1] pfam domain assignments we observe that the embedding could be used to suggest putative GO assignments for Pfam [2] Domains of Unknown Function. This article is protected by copyright. All rights reserved

UCL Discovery

Differentiable molecular simulation can learn all the parameters in a coarse-grained force field for proteins

Author: Greener JG
Jones DT
Publication venue
Publication date: 02/09/2021
Field of study

Finding optimal parameters for force fields used in molecular simulation is a challenging and time-consuming task, partly due to the difficulty of tuning multiple parameters at once. Automatic differentiation presents a general solution: run a simulation, obtain gradients of a loss function with respect to all the parameters, and use these to improve the force field. This approach takes advantage of the deep learning revolution whilst retaining the interpretability and efficiency of existing force fields. We demonstrate that this is possible by parameterising a simple coarse-grained force field for proteins, based on training simulations of up to 2,000 steps learning to keep the native structure stable. The learned potential matches chemical knowledge and PDB data, can fold and reproduce the dynamics of small proteins, and shows ability in protein design and model scoring applications. Problems in applying differentiable molecular simulation to all-atom models of proteins are discussed along with possible solutions and the variety of available loss functions. The learned potential, simulation scripts and training code are made available at https://github.com/psipred/cgdms

UCL Discovery

PubMed Central

The impact of AlphaFold2 one year on

Author: Jones DT
Thornton JM
Publication venue
Publication date: 01/01/2022
Field of study

The greatly improved prediction of protein 3D structure from sequence achieved by the second version of AlphaFold in 2020 has already had a huge impact on biological research, but challenges remain; the protein folding problem cannot be considered solved. We expect fierce competition to improve the method even further and new applications of machine learning to help illuminate proteomes and their many interactions

UCL Discovery

Increasing the Accuracy of Single Sequence Prediction Methods Using a Deep Semi-Supervised Learning Framework.

Author: Jones DT
Moffat L
Publication venue
Publication date: 02/07/2021
Field of study

MOTIVATION: Over the past 50 years, our ability to model protein sequences with evolutionary information has progressed in leaps and bounds. However, even with the latest deep learning methods, the modelling of a critically important class of proteins, single orphan sequences, remains unsolved. RESULTS: By taking a bioinformatics approach to semi-supervised machine learning, we develop Profile Augmentation of Single Sequences (PASS), a simple but powerful framework for building accurate single-sequence methods. To demonstrate the effectiveness of PASS we apply it to the mature field of secondary structure prediction. In doing so we develop S4PRED, the successor to the open-source PSIPRED-Single method, which achieves an unprecedented Q3 score of 75.3% on the standard CB513 test. PASS provides a blueprint for the development of a new generation of predictive methods, advancing our ability to model individual protein sequences. AVAILABILITY: The S4PRED model is available as open source software on the PSIPRED GitHub repository (https://github.com/psipred/s4pred), along with documentation. It will also be provided as a part of the PSIPRED web service (http://bioinf.cs.ucl.ac.uk/psipred/). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online

UCL Discovery

PubMed Central

Recent Developments in Deep Learning Applied to Protein Structure Prediction

Author: Greener JG
Jones DT
Kandathil SM
Publication venue
Publication date: 01/01/2019
Field of study

Although many structural bioinformatics tools have been using neural network models for a long time, deep neural network (DNN) models have attracted considerable interest in recent years. Methods employing DNNs have had a significant impact in recent CASP experiments, notably in CASP12 and especially CASP13. In this article, we offer a brief introduction to some of the key principles and properties of DNN models and discuss why they are naturally suited to certain problems in structural bioinformatics. We also briefly discuss methodological improvements that have enabled these successes. Using the contact prediction task as an example, we also speculate why DNN models are able to produce reasonably accurate predictions even in the absence of many homologues for a given target sequence, a result which can at first glance appear surprising given the lack of input information. We end on some thoughts about how and why these types of models can be so effective, as well as a discussion on potential pitfalls. This article is protected by copyright. All rights reserved

UCL Discovery